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Abstract 

We propose a new model for approximating spatiotemporal noise covariance for 
use in MEG/EEG source analysis. Our model is an extension of an existing model 
[1,2] that uses a single Kronecker product of a pair of matrices - temporal and 
spatial covariance; we employ a series of Kronecker products in order to construct 
a better approximation of the full covariance. In contrast to the single-pair model 
that assumes the same temporal structure for all spatial components, the proposed 
model allows for distinct, independent time courses at each spatial component. This 
model better describes spatially and temporally correlated background activity. At 
the same time, inversion of the model is fast which makes it useful in the inverse 
analysis. We have explored two versions of the model. One is based on orthogonal 
spatial components of the background. The other, more general model, is based 
on independent spatial components. Performance of the new and previous models 
is compared in inverse solutions to a large number of single dipole problems with 
simulated time courses and background from authentic MEG data. 
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1 Introduction 



The objective of magnetoencephalography (MEG)/ electroencephalography 
(EEG) source localization is to infer active brain regions from measurements 
outside of the human head. Often, for example in evoked response experiments, 
the data from individual stimulus trials are averaged, time-locked to the stim- 
ulus presentation. This averaged post-stimulus signal is compared with the 
statistical properties of background noise (averaged signal far from the stim- 
ulus time) and this difference is used to infer the location and time-courses of 
neural activity that is, at least on average, generated in response to the given 
stimulus. Because such inferences are based on differences between signal and 
background, it is important to characterize the statistical properties of the 
background as accurately as possible. 

The Central Limit Theorem lends support to the common assumption that 
the averaged background data is Gaussian distributed, even though the dis- 
tribution of single trial background may not be Gaussian. The log likelihood 
function is a common mathematical expression quantifying the likelihood that 
a given model (e.g. of neural current) could have produced the measured data. 
For Gaussian, zero-mean averaged background noise, the log likelihood func- 
tion is given by: 



h wt , - J C k/ (x')j(x',t')dx' . 

(1) 

Here b k t are the averaged measurements (the data being analyzed) at sensor 
k and time t; j(x,t) is the neural (source) current distribution over space 
x and time t; Ck(x) is the forward or lead field for sensor k - the linear 
operator which connects source currents to predicted measurements in the 
absence of noise. COV is the covariance of the averaged background activity, 
which describes second order statistical properties of the MEG/EEG data in 
the absence of sources. There are a number of different inverse algorithms in 
use e.g. [3,4,5,6,7,8] but most use the likelihood formulation in some way. In all 
cases, accurate covariance is required to solve the source localization problem 
reliably. The covariance is commonly taken to be diagonal, even though there is 
ample evidence that the background is correlated over space and time [9,10,11]. 
This may adversely affect the results of inverse calculations, for example by 
biasing the locations of reconstructed sources [12]. 



b kt - J C k (x)j(x,t)dx COV kt ) k , t , 



The sample covariance of the averaged background data is related to the 
sample covariance of the single-trial background data by a simple expression: 
COV = -gCOV. Here, M is the number of trials being averaged and COV 
is the sample covariance of the single-trial background data. This relation as- 
sumes the trials are independent draws from the single-trial distribution but 
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does not assume a particular form for this distribution. The task of estimat- 
ing the average background covariance may be accomplished by estimating 
the single-trial background covariance and scaling it by the number of trials 
in the average. 

Given that MEG/EEG is measured in M trials, on L sensors and in C time 
samples, let E m be the L x C single trial noise matrix at trial m. In this 
case, the conventional way to estimate the full covariance matrix of dimension 
N = LC for the averaged noise is by 



i M 

CQV= M(M-1) 5> ec ( E J - vec{E)){vec{E m ) - vec(E)) T , (2) 

i M 

E = ^£ E - (3) 

m=l 

where vec(E) is all the columns of E stacked in a vector. In order to simplify 
notation in this paper we use symbol M both for the number of available noise 
samples and for the number of stimuli over which averaging is done. Note that 
in the general case these numbers may not be the same. 

There are a number of reasons why this estimate of the full covariance is 
difficult to use and why an approximation is needed. First, for modern multi- 
sensor detectors sufficient experimental data is rarely available to adequately 
estimate the large number of parameters present in the full covariance ma- 
trix. For example, for 35 time samples and 121 channels and considering the 
fact that the covariance matrix is symmetric, 8, 969, 730 parameters should 
be determined ((121 * 35)((121 * 35) + l)/2). This far exceeds the amount of 
data typically available. Second, because the spatiotemporal noise covariance 
matrix is so large, a tremendous amount of memory is required for its storage. 
Third, this full covariance is almost impossible to handle in the likelihood for- 
mulation, since the computation time of calculating the inverse still renders 
the task infeasible in most interesting cases, even if it was possible to estimate 
the covariance with the given data. A naive algorithm for matrix inversion 
takes 0(N 3 ) time where N is the dimensionality of the matrix. Though there 
are some improvements over this result for large matrices [13,14], the problem 
is overwhelming for typical computing equipment and for interesting values 
of N. To summarize: it is almost aways impossible to estimate the full spa- 
tiotemporal covariance due to lack of data; in those cases when the estimation 
is possible, the inversion is computationally hard; and in any case the amount 
of storage required is high. Due to all these difficulties with the estimation 
of the full spatiotemporal covariance, an accurate approximation is needed. 
In addition to addressing the above three problems mentioned above a good 
approximation should capture as much of the structure in the noise as possible 
and reduce the errors in inverse solutions. 
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In this paper we describe three different models of increasing complexity and 
different ways to estimate them (Section 2). The first is the widely used diag- 
onal approximation, which has no spatial or temporal correlation and whose 
diagonal elements consist of sensor noise variances (Section 2.1). The second 
model is a Kronecker product approximation of a temporal covariance and 
a spatial covariance under the assumption that a temporal covariance and a 
spatial covariance are independent and separable [1]. Parameterized [1] and 
unparameterized [2] variations of this approximation are available. This paper 
deals only with the unparameterized approximation estimated in the maxi- 
mum likelihood framework suggested in [2] (Section 2.2). Two novel and more 
complex models that do not employ the assumption of independence and sep- 
arability of temporal and spatial covariance are proposed in Section 2.3 and 
Section 2.4. The first model is a multi-pair Kronecker product approximation 
based on orthogonal spatial basis and the second one is a variant based on 
independent spatial basis. 

Since the goal of better noise characterization by covariance modelling is to 
improve results of inverse algorithms, we have chosen to test and compare all 
models described in this paper by using them in algorithms for source local- 
ization. Performance of the approximations in reconstructing dipole locations 
and time courses is tested using a large number of simulated single dipole 
data sets constructed from empirical data for background noise and simulated 
dipole sources covering a wide range of locations and orientations (Section 3). 



2 Models and Methods 

This section describes all covariance models compared in this paper. Among 
them, two novel and relatively more complex models are introduced and ex- 
plained. Models are presented in the order of increasing complexity. 

2.1 Diagonal approximation 

It is common when solving the inverse problem to model covariance as a 
diagonal or even as an identity matrix. In the diagonal case the approximated 
full spatiotemporal covariance is expressed as COV oc T <g> S, where T is the 
temporal covariance which is taken to be the identity; S is the diagonal spatial 
covariance with elements of the diagonal being sensor variances; and <g> is the 
Kronecker product. This model is easy to estimate. It has only L parameters, 
where L is the number of sensors. At the same time this simple model is better 
than not using any covariance estimate at all as in the Ordinary Least Squares 
(OLS) approach. 
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2.2 Unparameterized Kronecker product model 



De Munck et al. [2] proposed a more complex model, based on the assumption 
that temporal and spatial covariance of the background in MEG and EEG arc 
independent and separable. This allows one to cast spatiotemporal covariance 
in the form that uses the Kronecker product: 



COV w T <g> S 



tuS ti2S • • • ticS 

t2lS t22S • • • t2C*S 



tciS tc?.S 



C2>- 



(4) 



Temporal covariance is a C x C matrix T and spatial covariance is an L x L 
matrix S, where C is the number of time samples and L is the number of 
sensors. In [2] the temporal covariance matrix T is normalized to one. Both 
matrices are unparameterized. Note, that a single set of variances needs to 
be divided between the spatial and temporal matrices and that normalization 
of T takes away a degree of freedom. In this case the number of parameters 
that need to be estimated equals L(L + l)/2 + C{C — l)/2. This is less than 
all elements of both matrices because both are symmetric. These parameters 
are estimated from the data using a Maximum Likelihood (ML) method as 
described in [2]. In deriving these ML estimators, the single trial background 
data was assumed to be Gaussian with a covariance that factored into sep- 
arate spatial and temporal matrices. This Gaussian distribution was used to 
construct the likelihood of the single trial background data, from which esti- 
mators were derived for the parameters of the spatial and temporal covariance 
matrices that maximized this likelihood. The resulting estimators are a cou- 
pled set of equations between spatial and temporal parameters that may be 
solved using an iterative technique. 



2.3 Orthogonal basis multi-pair model 



In the Kronecker product model above, all spatial components of the back- 
ground have the same temporal covariance structure. This statement describes 
the main assumption of the single-pair Kronecker product model. In order to 
motivate multi-pair models, we re-write the single-pair Kronecker product 
model in the following way. First, perform a spectral decomposition of the 
spatial covariance: 

T<g>XVS<, (5) 
1=1 
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where S l = vivj is an orthonormal basis component represented as a singular 
matrix. This form is one of the conventional ways of writing a spectral repre- 
sentation of a matrix. Using the identity A £g> (B + C) = A®B + A®C, the 
expression (5) can be represented as: 

L L 

^T<gxj'<S' = ^(T , T(8)5' (6) 

1=1 i=i 

The left hand side of the equation (6) makes it obvious that each spatial com- 
ponent has the same temporal covariance. The contribution of each temporal 
covariance is weighted by the variance of the corresponding orthonormal spa- 
tial component, as seen from the right hand side of (6). In the final sum such 
weighting of temporal covariances makes no difference since they all are the 
same. 

In a more realistic case it is easy to picture a situation where several noise 
generators having distinct spatial patterns (or, similarly, belonging to separate 
spatial components) also have different and independent temporal structures. 
They can have a focal or a distributed spatial pattern that can be described by 
the corresponding spatial component. Furthermore, some background sources 
of noise that are external to the head origin can also be captured by such a 
model. 

We propose an alternative multi-pair model of the spatiotemporal noise co- 
variance as a series of orthonormal spatial components S l of the background 
data and their corresponding temporal covariance matrices T l expressed as: 

L 

cov ^J2 r> ( 7 ) 

i=i 

The model is build on the following assumptions: 

Al Spatiotemporal noise is generated by L spatially orthogonal generators 
which do not change their location during the period of interest. 

A2 Each spatial component has a time course independent from those of 
other components. 

A3 Superposition of time courses of each component measured at the sensors 
has a Gaussian distribution with zero mean. 

Let's assume for now that we are given the orthonormal components S l . (We 
will discuss a way to obtain those later in the section.) As in (5) the orthonor- 
mal basis spans the whole sensors space having dimension L. The next task 
is to estimate temporal covariances T l of each component. This can be done 
due to the assumption A3 that the background is Gaussian and using the 
Maximal Likelihood estimation as demonstrated in the Appendix A. Assume 
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that we have M single trial noise samples E m - aLxC matrix with L number 
of sensors and C number of time points. Now we can summarize orthogonal 
basis multi-pair model in the following way: 

COV w cov = Tl ® s l , 

1=1 

-i M 
m=l 

S l =v lV J. (8) 

Expression (8) represents a model for the averaged noise hence the normaliz- 
ing factor in the estimate of T l is squared. Note that although the form of 
expression (7) may suggest that it is a summation of L single pair models, 
this is not so. Components S l are not covariances and furthermore they are 
singular because they represent a dimension in an L dimensional space. 

Another important feature expected of a covariance model to make it useful 
in the analysis is a computationally manageable inverse. The inversion of this 
multi-pair model can be expressed by (see Appendix B for details) 



COV 1 = Y J {T l Y 1 ® S l ■ (9) 

i=i 

This inversion is easily manageable. Its running time is 0(LC 3 ), similar to 
that for the one-pair models. 

While this model has far fewer free parameters than the full spatiotemporal 
covariance estimate (2) the number of parameters is larger than that of models 
described above (though not by a great amount). The L spatial components S l 
consist of L(L — 1)/2 parameters plus LC(C+ 1)/2 parameters for all temporal 
covariances. The increase in parameters provides an increase of expressive 
power that is greater than of the single pair model (4). Furthermore, it is free 
from assumptions of identical character of the noise over all spatial locations 
and hence may capture more information about structure of the noise. 

We derived this model without any assumptions about how spatial orthogo- 
nal components were obtained. That means that in principle any set of such 
components should keep the validity of the derivation and retain the invert- 
ibility of the model. However, since we assume that the background has the 
distribution close to the Gaussian then it is natural to use singular value de- 
composition (SVD) to obtain orthogonal spatial components. In this work we 
use SVD to estimate the spatial orthogonal components for the orthogonal 
basis multi-pair model. SVD of the noise data collected in a matrix A by 



7 



stacking M single trial spatiotemporal samples E m looks like A = USV T . U 
is a CM x L orthogonal matrix; S is an L x L diagonal matrix with singular 
values of A A; as diagonal elements; and V is an L x L orthogonal matrix of 
spatial components. Each row of V T is a spatial component v t that is used to 
form orthogonal spatial components S l of our model (7). 

Very often SVD is used for dimensionality reduction when only the most signif- 
icant singular values are accounted for and all other values are neglected [15]. 
In this application we do not use SVD for this purpose. Dimensionality re- 
duction, i.e. approximation of the full covariance is already performed in (7) 
based on the stated assumptions. The sole purpose of SVD in estimation of 
this model is finding spatial orthogonal components and their corresponding 
time courses. 

2.4 Independent basis multi-pair model 

The model introduced in the previous section has greater expressive power 
than the models presented previously. Nevertheless, its requirement of orthog- 
onality may be inappropriate for MEG/EEG data. In the general case noise 
sources do not have to be spatially orthogonal to one another. In an effort 
to remove this restriction we suggest a generalization, that does not restrict 
spatial components to be orthogonal. Instead we consider independent spatial 
components. 

Let us assume that we are given independent spatial components of the back- 
ground. Denote each independent component as wi, its corresponding spatial 
matrix as 1Z 1 = wiwf and the matrix, in which the I th row is the wi vector, 
as W _1 . In this framework the only assumption needed is independence of 
each wi from the rest of the components. Estimation of temporal covariances 
of each independent spatial component can be performed in the Maximum 
Likelihood framework (see Appendix A). 

COV « COV = Y, T l ® K l 

1=1 

1 M 
m=l 

7l l =wiwJ. (10) 

Here wi stands for the /-th row vector of W _1 as described above and M 2 is 
due to the averaged noise modeling. 

The model is more general compared to the one using orthogonal bases from 
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Section 2.3. It can be constructed in terms of any spatially independent basis 
set, which also includes orthogonal sets. The number of free parameters of the 
model has increased to L 2 + LC(C — l)/2. At the same time its inversion is 
still manageable and can be expressed as (see Appendix B for details) 

COV 1 = E(T')- 1 ® [WW T U'WW r ], (11) 
i=i 

This makes the model useful in the analysis. However, the computational 
cost of this operation is greater than that for the orthogonal basis multi- 
pair model. It is now 0(L(C 3 + 2L 3 ) + L 3 ), where the 2L 3 part comes from 
repeated multiplication of independent components 1Z 1 by the combination 
WW T . The additional L 3 component is the time to calculate this combination. 
Nevertheless in localization algorithms, like in the one we used in this study, 
the covariance matrix does not change across iterations and its inverse needs 
to be computed only once. The increase in computation does not significantly 
influence total running time of inverse procedures and thus represents a small 
tradeoff for the generality of the model over the one using an orthogonal basis. 

The above reasoning applies to any independent components of the measured 
signal. It is crucial to adopt an optimal way of estimating such components 
(with respect to some criterion). A class of algorithms that uses an optimality 
criterion to find independent components arises in the field of blind source 
separation: the class of Independent Component Analysis (ICA) algorithms. 
We adopt such an algorithm for this work as a tool to discover independent 
spatial components. 

Different ICA algorithms have been applied to MEG [16,17,18,19,20] and 
EEG [21,22,23] data. Among them the most widely used are second order 
blind identification (SOBI) [24], Infomax [25], and fICA [26]. Our goal in this 
paper is not to evaluate performance of different ICA algorithms in the sug- 
gested framework; that is work for a subsequent study. Rather, here we chose 
one algorithm to demonstrate the feasibility and applicability of ICA for find- 
ing independent spatial components H l for modeling background noise. A 
preliminary study compared Infomax and SOBI algorithms. Initial estimates 
suggested that SOBI gave a better approximation to the structure of the full 
spatiotemporal covariance. In this paper we have used a SOBI algorithm im- 
plementation provided by ICALAB toolbox [27] with the default settings. The 
SOBI algorithm was also used in [18,17] and the motivation for applying it to 
MEG data is given there. 

Assume that we have M times single trial noise data and that an ICA algo- 
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rithm is applied to this data constructed in a martix A: 



w 



;i2) 



In the above expression W is an L x L unmixing matrix obtained as a result of 
an ICA algorithm, the U m form aCMxi matrix of independent components 
U. This W matrix is the matrix used in (10) and (11). 

Although ICA is widely used for dimensionality reduction, we note that in 
this application no dimensionality reduction is performed based on ICA; the 
technique is used exclusively as a discovery tool. 



3 Comparing performance of covariance models 



Many different approaches can be used to evaluate how well different models 
approximate the full spatiotemporal covariance. Calculating some norm of the 
difference between true covariance and its approximating model is a common 
measure, for instance the Frobenius norm. Kullback-Leibler divergence [28] is 
also a popular measure. A scatterplot can be a good "visual" tool for com- 
parison as described for example in [1]. However, different measures can give 
different and even contradictory results when comparing several models. The 
ultimate goal of modeling full spatiotemporal covariance is to achieve better 
inverse performance. Thus, evaluation of model performance in an inverse algo- 
rithm is adopted as the main comparison tool in this study. A large number of 
single dipole data sets were analyzed using different background noise models. 
Each data set was constructed using empirical whole head MEG background 
data together with simulated dipole sources. 

The empirical MEG data and anatomical information used for performance 
comparisons were acquired in the following experiment: 

Electrical median nerve stimulation at the motor twitch threshold was applied 
using a block design of 30s on, 30s off for a total of 10 blocks for each of 8 
runs. The stimulus alternated across runs, with four runs each of left side 
stimulation and of right side stimulation. The ISI (interstimulus interval) was 
randomized between 0.25 and 0.75s (Fig. 1). Since there is no stim for long 
periods this design provides a large sample of brain noise data. Which might 
be useful in other noise studies. Data were collected on a 4D Neuroimaging 
Neuromag-122 whole-head gradiometer system with 122 channels [29]. The 
experiment used a male subject, age 38, sampling rate was set to 1000Hz. In 
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this paper data from sensor 51 was not used since its output had too many 
artifacts, leaving only 121 useable channels. 



>^ 

1 2 
E 



Stimulus type 



0.0-10° 2.0-10 5 4.0-10 5 6.0-10 5 8.0-10 5 1.0-10 6 1.2-10 6 
time samples 



Fig. 1. The manner in which stimuli were applied in the experiment, x-axis shows 
time samples and y-axis - stimulus types, where type 1 - right hand, 2 - no stimulus, 
3 - left hand. 



This somatosensory data set contained high frequency transient signals result- 
ing from the electrical stimulus. These transients are distorted when filtered 
with a linear filter, due to ringing effects and adversely affect nearby data, 
including the expected early response at about 20ms post-stimulus. To avoid 
this and still be able to remove low frequency drifts, a median filter was used 
as follows. First, the signal was filtered with a median filter of window size set 
to obtain 1Hz low pass filter (1000 samples in our case). Second, the result 
from the previous step was subtracted from initial measurements to obtain 
1Hz high-pass filtered signal. The large 60Hz noise and harmonics in the data 
were reduced but not removed by replacing the points in the power spectrum 
in the data near 60Hz and harmonics with values that interpolated between 
adjacent power spectrum points. 

Noise samples of 35ms latency were extracted from the prestimulus area of 
right side stimulation no earlier than 300ms after the previous stimulus was 
applied. Since filtering can affect temporal covariance it is important to es- 
timate covariance after filtering has been applied. All covariance models in 
this section were estimated using this continuous background data, scaled ap- 
propriately to obtain covariances of the noise averaged over 602 samples. All 
the covariance models yielded similar sensor variances, differences were only 
observed in the correlation structure. 



Continuous noise samples were combined in a way that allowed for averaging 
over 602 independent samples. This approach supplied six different average 
noise data sets. Different signal to noise ratios (SNR) for single dipole problems 
were obtained by scaling these noise samples before combining them with 
simulated data. The measure of SNR used in this paper was constructed by 
squaring all values of the signal vector and then adding them together (inner 
product of the signal vector) and dividing this value by squared and summed 
values of the noise vector; a square root was taken from the obtained ratio. 
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The locations and orientations of fifty dipoles were drawn at random from 
the gray matter voxels that had been tagged from the subject's anatomical 
MRI used in the empirical MEG experiment. In order to mitigate the effect 
of depth vs. strength that would complicate the interpretation of results, the 
area from which random locations were drawn was constrained to be further 
than five centimeters from the head center (Fig. 2). 




Fig. 2. Dipoles randomly scattered over the cortex. These dipoles give fairly uniform 
coverage of the cortex and with different orientations create many possible realistic 
sources. 

Each single dipole problem consisting of 121 sensor values over 35ms (121x35 
matrix) was constructed in the following way: 

(1) For each dipole in the set of 50 a sinusoidal time course was used. 

(2) This dipole was projected to the sensor space using Sarvas' spherical head 
model [30]. 

(3) To the simulated signal one of the six noise sample sets was added. Noise 
was appropriately scaled in advance according to the intended SNR. 

The total number of single dipole problems run through the inverse solution 
routine was 2400. This number combines 50 dipoles with 6 noise kinds for 
each and 8 SNR values (0.3,0.4,0.5,0.7,1.0,1.5,2.0,3.0). Figure 3 demonstrates 
how different the noise samples were and what diversity was introduced by 
changing SNR. 

For each data set we estimated the location, orientation and time course of 
a single dipole that maximized the likelihood, using the different background 
covariance models. In these trials the location and orientation of the dipole 
did not vary over time. Initial attempts to solve for these parameters using 
an optimization algorithm were plagued by local minima problems, which is 
common in dipole inverse algorithms [31,32]. The degree of these local minima 
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5 10 15 20 25 30 5 10 15 20 25 30 5 10 15 20 25 30 

ms ms ms 



SNR 0.3 SNR 1.0 SNR 3.0 

Fig. 3. Three noise sample sets and three signal to noise ratios for one of the dipoles. 
Columns have the same signal to noise ratio and rows have the same noise instance 
added to the simulated dipole. 

errors confounded the errors between background covariance models. To mit- 
igate this confound we employed a sampling algorithm using Markov Chain 
Monte Carlo (MCMC) [33,34,5], which sampled the location and orientation 
parameters from the likelihood using the maximum likelihood time course val- 
ues for a given set of location and orientation parameters. From these samples 
we then calculated the mean values of the parameters and used this as our 
estimate of the maximum likelihood result. This reduced the local minima 
problems but did not eliminate them. To further reduce the local minima 
effects we ran multiple MCMC samplings for each data set and chose the re- 
sults that had the highest likelihood. We are confident that the results from 
this set of procedures primarily reflect the errors associated with the different 
background covariance models. 

There are 300 results for each SNR value for each model. Figure 4 shows a 
histogram of location errors for the diagonal model with SNR of 1.0. The 
shape of this distribution (non-Gaussian with large tails), is typical for all of 
the models and SNR values. From each of these distributions we calculated 
the point on the error axis below which 90% of the probability mass is con- 
centrated. Figures 5(a) and 5(b) show these 90% error plots for location and 
time courses as a function of SNR for each background covariance model. Here 
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the location error was calculated as the root mean squared error and the time 
course error was calculated as the root of the squared error averaged over time. 
The two multi-pair models had the lowest errors, followed by the two single 
pair models with slightly higher errors. Finally, the diagonal model had by far 
the largest errors. 

0.3 i , 1 , 1 1 




2 4 6 8 10 

location error (mm) 



Fig. 4. Location error histogram for the case of using the diagonal covariance ap- 
proximation at SNR 1.0 




SNR 



(a) location errors (b) time course errors 

Fig. 5. Combined 90% location error measure (a) and combined 90% time course 
errors (b) for a series of single dipole problems. 
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4 Discussion 



This work, like previous investigations, underscores the value of a suitable 
models of brain and other noise sources for accurate localization of neural 
sources. The most sophisticated models we tested gave the best performance, 
with improved estimation of both location and timecourse of the target neural 
source. 

We also note the relatively small improvement in the multi-pair models over 
the single pair models. This is especially interesting given that the compu- 
tational burden, both for estimating the parameters of the models and in 
implementing them in the inverse calculations, is relatively small for the sin- 
gle pair model and much higher for multi-pair models. Thus these findings 
suggest that a significant gain in performance may be had with a relatively 
small computational cost by using a single pair model. However, more work 
needs to be done to investigate the validity and generality of this finding. For 
example, these results are from a single subject, using single dipole sources and 
over a limited temporal range. Future work would investigate whether these 
results hold over multiple subjects, with more complex source configurations 
and for a more extended temporal range. It is possible that noise components 
will be more variable or more complex if these conditions change. 

The assumption that each noise generator contributing to the measured back- 
ground does not change its spatial location is supported by observations of 
stability of the brain function at the resting state [35]. Spatial diversity of 
the background signal can be observed, for example, as nonuniform spatial 
distribution of the alpha rhythm over the cortex. Moreover, the background 
may not have many generators with distinct independent time courses. In the 
data examined here the temporal covariances across components were rela- 
tively similar in the multi-pair models. This is a possible explanation of why 
the multi-pair models yield relatively little improvement over the single pair 
models. 

Figure 6 shows contour plots of spatial components from the SVD based multi- 
pair model and their corresponding temporal covariances. There are clear dif- 
ferences among the temporal covariances. However, if one takes out the compo- 
nents that are either sensor dominated with very short temporal correlations 
(B2,B6 and D6) and the large 60% powerline component (Bl) the remaining 
components, which are presumably dominated by background brain activity, 
have similar temporal correlation structure. It will be interesting to see if in 
future work this feature is observer across multiple subjects and over a more 
extended temporal range. 

An observation was made in the course of this work that may support the sup- 
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position that our data does not exhibit great variability in the hidden source 
structure. The following heuristic was used to estimate one-pair Kronecker 
model for parallel comparison with the results presented in this paper. Sup- 
pose that each sensor has the same temporal covariance. Then it is possible 
to estimate the single-pair model using: 



T = 



Jj— -(e^,...E m ) 



M[LM - 1) 



M(CM - 1) 



(Ei, ...Em) 



\^Em j 



\E M J 



(13) 



where E m is an L x C single trial noise data with L number of sensors and 
C number of time points; M is the number of trials. T is normalized to leave 
the variances in S only. This is not an optimized estimator, as compared to 
the single-pair Maximum Likelihood estimator; nevertheless, we found both 
of these estimators performed similarly. Looking at the two estimators, they 
could yield similar results if the structure of the noise is not very complicated. 
Furthermore, a compatible error might be introduced to the MLE based model 
when the assumption about the Gaussian distribution of the un-averaged mea- 
sured background is violated. 



Despite almost identical performance of the multi-pair models we suggest that 
in terms of generality the model based on independent basis should be pre- 
ferred. Its power is not only in the increase of degrees of freedom and conse- 
quently the ability to describe a bigger class of possible spatial components but 
also in the discriminative criterion. The main criterion used in PCA is min- 
imization of the reconstruction error [36]. As a result this produces spatially 
uncorrelated components. This may not be a good model of the MEG/EEG 
background, especially that due to the background brain activity. It seems 
more reasonable to assume that different noise generators are independent. 
In that case, the independent basis model is complex enough to describe the 
underlying process and a correct choice of algorithm (for example an ICA 
algorithm) can estimate independent generators. 

Considering the observation that not all spatial components may have distinct 
temporal structure, a further generalization is needed. The multi-pair model 
can have an additional parameter for the number of significant components. 
In this case only significant components will contribute to the final covariance 
with their respective distinct time courses. Other components would be sup- 
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pressed. This generalization is not obvious since special care should be taken to 
retain inversion of the model. We leave this work for subsequent publications. 

Multi-pair models can in principle be directly applied to the analysis of EEG 
data. Though the topic needs further study it is expected that the differ- 
ent structure of EEG would not affect the multi-pair models as much as the 
possible added complexity of the noise. We expect that in the case of hetero- 
geneous noise sources expected in EEG signals, multi-pair models should show 
even better performance due to their increased expressive power and ability 
to incorporate information about many noise components. 

We also note that a further reduction of parameters for multi-pair models can 
be achieved by assuming stationarity of the background. This assumption is 
well supported by studies, see for example [37]. Observation of time courses 
of orthogonal and independent components for the data used in this study 
also supports this assumption. Temporal autocovariances have an Toeplitz- 
like structure and can be constructed from correlations shown in Figure 6. 
This assumption reduces the number of parameters to be estimated to L(L + 
l)/2 + LC for the SVD based model and to L 2 + LC for the ICA based model. 
Another important improvement this assumption provides is the reduction 
in running time for inverting the multi-pair approximations from 0(LC 3 ) to 
0(LC 2 ) [38]. This will be investigated in future work. 

Different models that would cover the range between the diagonal and the one- 
pair model can be developed. For such models an additional study is needed to 
investigate how spatial models alone or temporal models alone would compare 
in performance as well. These truncated models can be useful in the case when 
only a scarce amount of data is available for estimation. But when the single- 
pair model can be estimated it subsumes all lower order models and should 
not perform worse than these. As we show in this paper multi-pair models 
subsume the single-pair model and thus should not perform worse than it. 
This was demonstrated by the increased localization accuracy. In the case 
when sufficient estimation data is available and the additional computational 
load of multi-pair models does not make significant difference, we suggest using 
multi-pair models to increase accuracy. In the worst case, multi-pair models 
should not perform poorer than the one-pair model. 



5 Conclusions 

In this paper we have introduced two multi-pair spatiotemporal covariance 
models as a generalization over the previous work [2] which extends the ex- 
pressive power and provides better localization results. The orthogonal basis 
multi-pair model estimated using SVD is a more general but still a restricted 



17 



one due to the orthogonality constraints. A more general model is the inde- 
pendent basis multi-pair model estimated using an ICA algorithm. Models 
were compared on the basis of their performance in a localization algorithm. 
Performance statistics were gathered from inverse solutions to a large num- 
ber of single dipole problems with simulated sources and background from 
real MEG data. In terms of localization error, the orthogonal basis and in- 
dependent basis multi-pair models demonstrated the best performance. The 
independent basis model is a potentially better model due to the increase in 
the descriptive power. However, the errors from the single pair model exam- 
ined were not much worse. Future work will address whether these findings 
are a general feature of MEG background across multiple subjects, with more 
complex source configurations and over a more extended temporal range. 
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A Derivation of multi-pair models 



For the following derivation we need to calculate determinants of the multi- 
pair models. This is done in the Appendix C for the case of orthogonal basis 
model (C.13). Also the inverse of the orthogonal basis multi-pair model (9) is 
utilized in the following. Using these result the log likelihood for the Gaussian 
pdf for the orthogonal basis multi-pair model is expressed like: 

M L 1 ML 

C = CO nst - _5>(|T'|) - -fr(£ £ E^E^T*)" 1 ) (A.l) 

/ 1=1 / m=H=l 



Differentiating with respect to T results in: 
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M 1 M 

d£ = --tr((T l y 1 dT l ) + -tr(£E£S'E m (T i )- 1 dT'(T 1 )- 1 ) 

m=l 

/If 1 M 

= --tr((T')- 1 dT') + -tr(5:(T , )- 1 E^5 , E m (T')- 1 dT') 

m=l 

M 1 M 

= — — tr([(T') _1 - - 2 (T')- 1 E^ < S'E m (T')- 1 ]^) (A.2) 

Z iW m=l 

And the final result is: 

1 M 

T ' = 77 £ E ^' E - (A.3) 



Next we estimate temporal covariance for the independent basis multi-pair 
model. The determinant of the independent basis model is calculated in Ap- 
pendix C equation (C.14) and the inverse of this model is taken as in (11). 
The log likelihood in this case: 



M 

C = const- — (2Cln(|W- 1 |) + £ln(|T'|)) 
2 i=i 

i M L 

EEl^'^E™^) -1 ) ( A - 4 ) 

^ m=l 1=1 



Following the same path for derivation as for the orthogonal basis model case 
we end up with: 

1 M 

r ' = 77 E E^WW T ^WW T E m (A.5) 

m=l 



B Inversion of the multi-pair models 

We need the following identity for derivations of this section: 

(Ai ® Bi)(A 2 ® B 2 ) = (AiA 2 ® BiB 2 ) (B.l) 



First, lets prove that the inverse for the orthogonal basis multi-pair model is 
as in (9) . By the orthogonality of S l (S l S v = if I ^ I'), 
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cov cov 1 = (jr >H rl ® (j2 K 2 ^ 1 )' 1 ® 

= EI®(^) 2 (B.2) 
1=1 

By two properties of S l : (1) (S 1 ) 2 = S l ; (2) J2i = I due to orthogonality of 
V, we obtain the desired one: 



-l 

i=i i=i 



COV COV = ^Tl®(S z ) 2 = I®]T<S' = I®I = I. (B.3) 



Properties of S l are proved as follows: 
• property (1) 

{S l f = {v l vJ){v l vJ)=v l {vJv l )vJ 

= VivJ {vjvi = 1 by orthogonality of V) 

= S l (B.4) 



• property (2) : I = V V T = J2i v i v f 

Now we show that the inverse of the independent components multi-pair model 
is as in (11) Using the identity (B.l) 



L 

= T , (T ,, )- 1 <g> [HW^'WW 1 ]. (B.5) 
M'=i 

According to the definition of Hi: 

n t = w lW J = (W-YdefW- 1 . (B.6) 

Here {e{\l = 1, • • • ,L} are orthonormal canonical bases vectors. By the or- 
thogonality of e% and substituting (B.6) into (B.5), we obtain 

U l WW T n l 'WW T = (W- 1 ) T e i e[W- 1 WW T (W- 1 ) T e,-e^W- 1 WW T 
= (W- 1 ) T e / efe^^W T 

= 5(l,l')(W-y ei erW T (B.7) 
Finally, we obtain the desired result: 
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\i=i ) \i=i / 

= I <g> (W -1 )^ e z ef ]W T = I <g> (W-^^IW 7 = I. 



(B.8) 



C Calculation of the determinant 



Before performing actual calculations of the determinant we transfrom the 
original series form to a shape more manageable in our further reasoning. The 
only assumption needed for the transformation is that matrices S l are rank 
one outer products vivf, where v\ comes from an independent basis set. 

We first rewrite T l (g) S l in the matrix form: 



V^L rpl ql X^L rpl ql 

2^Z = 1 J 1,1'J 2^ = 1 J 1,2° 

EL rpl ql sr^L rpl ql 

Z=1 J 2,1 D 2^Z=1 J 2,2° 



EL rpl ql 
1=1 J 1,C D 

EL rpl ql 
1=1 J 2,C* D 



EL rpl ql s-^L rpl ql ST^L rpl ql 

1=1 J C,1 D 2^1=1 I C,2^ "' 2^l=l I C,C^ 



(CI) 



Considering each element separately we can see that: 



1=1 



1=1 



(C.2) 



where S is an L x L matrix with columns being v\ vectors from an orthonormal 
basis; A it j is a diagonal matrix with the elements along the diagonal being 
{TlpTfj, . . . ,T£j}. Observing (C.2) one more rewrite is possible: 



S ••• 
S ■■■ 

••• s 



Ai,i Ai, 2 

A2,l A2,2 



Ai,c 
A 2 ,c 



Ac,i Ac j2 • • • A CjC - 



S T ••• 
S T --- 

••• S T 



(C3) 



21 



The next step is actually calculating the determinant det T l <E> S l ^j : 





/ 


SO- 


• 






/ 


"Ai,i 


Ai )2 • 


• Ai )C 


\ 




/ 


V •■ 


• 


\ 


det 




OS- 


• 




det 




A 2> i 


A 2 , 2 • 


• A 2 ,c 




det 




S T - 


• 






V 


00- 


• s 


) 




V 


.Ac,i 


A C ,2 • 


• A C)C _ 


/ 




I 


0- 


• S T 


J 



(C.4) 



Let's leave diagonal matrices alone since their determinants are trivial and 
calculate the middle determinant. Let's introduce E l = eief , where ej is a basis 
vector from the canonical basis set. Now the matrix of A it j can be consequently 
rewritten as: 



i=i 



Ef=i T[ tl E l Ef=i T l l 2 E l 

EL T'l Tpl \^L T'l rpl 
l=\±2,\ tj 2^=1 L 2,2 tj 



Ef =1 Tl c E l 

EL rpl rpl 
1=1 1 2,C h 



EtiT^E 1 Ef=i T l C2 E l ■■■ Ef=i T l cc E l 



(C.5) 



By definition the determinant of a C x C matrix A is: 



det (A) =^cr(p)a! 



(C.6) 



where p is a permutation, cr(p) is the sign of a permutation and is an 
element of A. 



Because of the orthogonality of E l matrices to each other when their indices 
are not the same and using the property (E l ) n = E l : 
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det j^det (t 1 ) E l 



det (^pT l ®E^j 
det (^pT l ®E^j 
det [Y/r l ®E^j 

det {Y/r l ®E^j 



det (j2T l ®E l )=l[det (t 1 ) 
\i=i / i 



det (^«{ P )^Tl p Jl P2 ...Tl pc E^ 



det 







det (T z ) 



(C.7) 
(C.8) 
(C.9) 

(CIO) 

(C.11) 



With this result (C.ll) the determinant from (C.4) can be rewritten: 

L 

det (J2 T l ® S l ) = det (S) 2C J] det (t') (C.12) 



We first calculate a special case when 5' = vivf and all column vectors vi 
belong to an orthonormal basis, that is S from (C.12) in an orthogonal matrix. 
In this case det(5 l ) = 1 and the result is: 



det (£T l ®S l ) =[]det (r l ) 



(C.13) 



In the case when S is not orthogonal like in the ICA case, we get slightly 
different result. Redefine S l = R l , where R l = w t wf and w t is a row vector of 
the mixing matrix W~ l . Then (C.12) becomes: 



det (J2 T l ® Rl ) = det 2 ° ft det ( T 



(C14) 
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A BCD 



Fig. 6. Contour plots of the first 10 the most significant spatial components with 
corresponding temporal covariance functions (above the line) and of 2 the least 
significant spatial components. 
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